Khmaladze transformation

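The Khmaladze transformation is a statistical tool for constructing goodness-of-fit tests whose limiting null distribution does not depend on the hypothetical distribution function being tested, even when its parameters are estimated from the sample.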

Consider the sequence of empirical distribution functions F_n based on a sequence of i.i.d. random variables X_1,\ldots, X_n, as n increases. Suppose F is the hypothetical distribution function of each X_i. To test whether the choice of F is correct, statisticians use the normalized difference

v_n(x)=\sqrt{n} [F_n(x)-F(x)].

This v_n, as a random process in x, is called the empirical process. Various functionals of v_n are used as test statistics. The change of variable v_n(x)=u_n(t), t=F(x) transforms v_n to the so-called uniform empirical process u_n. The latter is an empirical process based on the independent random variables U_i=F(X_i), which are uniformly distributed on [0,1] if F is continuous and the X_i do indeed have distribution function F.
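The change of variable can be checked numerically. The following is a minimal sketch, assuming a standard exponential F and a small made-up sample; the function names and the data are illustrative and not part of the original text.

```python
import math

def empirical_process(sample, cdf, x):
    """v_n(x) = sqrt(n) * (F_n(x) - F(x)) for a hypothesised CDF `cdf`."""
    n = len(sample)
    f_n = sum(1 for xi in sample if xi <= x) / n  # empirical distribution function
    return math.sqrt(n) * (f_n - cdf(x))

def uniform_empirical_process(sample, cdf, t):
    """u_n(t) = sqrt(n) * (G_n(t) - t), built from U_i = F(X_i)."""
    n = len(sample)
    us = [cdf(xi) for xi in sample]
    g_n = sum(1 for ui in us if ui <= t) / n
    return math.sqrt(n) * (g_n - t)

def exp_cdf(x):
    """Standard exponential CDF (an assumption made for this illustration)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

sample = [0.12, 0.47, 0.95, 1.60, 2.30]  # illustrative data
x = 1.0
v = empirical_process(sample, exp_cdf, x)
u = uniform_empirical_process(sample, exp_cdf, exp_cdf(x))
print(v, u)  # the two values coincide because t = F(x)
```

Because F is increasing, X_i \le x holds exactly when U_i \le F(x), which is why the two processes agree at matching arguments.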

This fact was discovered and first utilized by Kolmogorov (1933), Wald and Wolfowitz (1936) and Smirnov (1937) and, especially after Doob (1949) and Anderson and Darling (1952), it led to the standard rule for choosing test statistics based on v_n. That is, test statistics \psi(v_n,F) are defined (possibly depending on the F being tested) in such a way that there exists another statistic \varphi(u_n), derived from the uniform empirical process, such that \psi(v_n,F)=\varphi(u_n). Examples are

\sup_x|v_n(x)|=\sup_t|u_n(t)|,\quad \sup_x\frac{|v_n(x)|}{a(F(x))}=\sup_t\frac {|u_n(t)|}{a(t)}

and

\int_{-\infty}^{\infty} v_n^2(x)d F(x)=\int_{0}^{1} u_n^2(t)\,dt.

For all such functionals, their null distribution (under the hypothetical F) does not depend on F, and can be calculated once and then used to test any F.
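As a small illustration of this distribution-freeness, the sketch below computes the first and the last of the example statistics (the supremum and the integral of the square) from the transformed values U_i=F(X_i), using the standard order-statistic formulas. The two hypothesised distributions (exponential and logistic) and the base values are assumptions chosen for the example.

```python
import math

def ks_from_uniforms(us):
    """sup_t |u_n(t)| computed from the order statistics of U_1, ..., U_n."""
    n = len(us)
    srt = sorted(us)
    d = max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(srt))
    return math.sqrt(n) * d

def cvm_from_uniforms(us):
    """Integral of u_n(t)^2 over [0, 1], in its closed form."""
    n = len(us)
    srt = sorted(us)
    return 1.0 / (12 * n) + sum(
        (u - (2 * i + 1) / (2 * n)) ** 2 for i, u in enumerate(srt)
    )

def exp_cdf(x):
    return 1.0 - math.exp(-x)

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two samples from different distributions that share the same U_i values.
base = [0.1, 0.35, 0.5, 0.8]
exp_sample = [-math.log(1.0 - u) for u in base]
logistic_sample = [math.log(u / (1.0 - u)) for u in base]

ks_exp = ks_from_uniforms([exp_cdf(x) for x in exp_sample])
ks_logistic = ks_from_uniforms([logistic_cdf(x) for x in logistic_sample])
cvm_exp = cvm_from_uniforms([exp_cdf(x) for x in exp_sample])
cvm_logistic = cvm_from_uniforms([logistic_cdf(x) for x in logistic_sample])
print(ks_exp, ks_logistic, cvm_exp, cvm_logistic)
```

Both samples give the same statistic values, since each statistic depends on the data only through the U_i.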

However, one only rarely needs to test a simple hypothesis, in which a fixed F is given as the hypothesis. Much more often, one needs to verify parametric hypotheses where the hypothetical F=F_{\theta_n} depends on some parameters \theta_n, which the hypothesis does not specify and which have to be estimated from the sample X_1,\ldots,X_n itself.

Although the estimators \hat\theta_n most commonly converge to the true value of \theta, it was discovered (Kac, Kiefer and Wolfowitz (1955) and Gikhman (1954)) that the parametric, or estimated, empirical process

\hat v_n(x)=\sqrt{n} [F_n(x)-F_{\hat\theta_n}(x)]

differs significantly from v_n, and that the transformed process \hat u_n(t)=\hat v_n(x), t=F_{\hat\theta_n}(x), has a limit distribution, as n\to\infty, that depends on the parametric form of F_{\theta}, on the particular estimator \hat\theta_n and, in general, within one parametric family, on the value of \theta.
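The effect of estimation can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the data are exponential, the scale parameter is estimated by the sample mean, and the sample size, number of replications and seed are arbitrary choices. It compares an upper quantile of \sup|u_n| (scale known) with that of \sup|\hat u_n| (scale estimated).

```python
import math
import random

def ks_statistic(us):
    """sqrt(n) * sup_t |G_n(t) - t| for values us in [0, 1]."""
    n = len(us)
    srt = sorted(us)
    d = max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(srt))
    return math.sqrt(n) * d

def simulate(n=50, reps=300, seed=0):
    """Return sorted statistic values for the simple and the parametric case."""
    rng = random.Random(seed)
    simple, estimated = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        # Simple hypothesis: F(x) = 1 - exp(-x), scale known to equal 1.
        simple.append(ks_statistic([1.0 - math.exp(-x) for x in xs]))
        # Parametric hypothesis: scale replaced by its estimate, the sample mean.
        scale = sum(xs) / n
        estimated.append(ks_statistic([1.0 - math.exp(-x / scale) for x in xs]))
    return sorted(simple), sorted(estimated)

simple, estimated = simulate()
k = int(0.95 * len(simple))  # index of the empirical 95% quantile
q_simple, q_estimated = simple[k], estimated[k]
print(q_simple, q_estimated)
```

In this setting the quantile for the estimated process comes out noticeably smaller, so critical values computed for the simple hypothesis would give a test with the wrong size.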

From the mid-1950s to the late 1980s, much work was done to clarify the situation and understand the nature of the process \hat v_n.

In 1981, and then in 1987 and 1993, E. V. Khmaladze suggested replacing the parametric empirical process \hat v_n by its martingale part w_n only:

\hat v_n(x)-K_n(x)=w_n(x)

where K_n(x) is the compensator of \hat v_n(x). The following property of w_n was then established: the limit distribution of the time-transformed process

\omega_n(t)=w_n(x), t=F_{\hat \theta_n}(x)

is that of standard Brownian motion on [0,1], i.e., it is again standard and independent of the choice of F_{\hat\theta_n}.
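To make the idea of subtracting a compensator concrete, consider only the simplest case, in which no parameters are estimated. There the martingale part of the uniform empirical process has the well-known closed form w_n(t)=u_n(t)+\int_0^t \frac{u_n(s)}{1-s}\,ds=\sqrt{n}\,[G_n(t)-\int_0^t \frac{1-G_n(s)}{1-s}\,ds], where G_n is the empirical distribution function of the U_i. The sketch below evaluates this expression; it does not implement the parametric transformation, whose compensator also involves the score function of the family, and the function name is hypothetical.

```python
import math

def martingale_part(us, t):
    """w_n(t) for the uniform empirical process, simple hypothesis only.

    Requires 0 <= t < 1 and all us in [0, 1).
    """
    n = len(us)
    g_n = sum(1 for u in us if u <= t) / n
    # Compensator of G_n: integral over [0, t] of (1 - G_n(s)) / (1 - s) ds.
    # Each observation contributes -log(1 - min(t, U_i)) / n.
    compensator = -sum(math.log(1.0 - min(t, u)) for u in us) / n
    return math.sqrt(n) * (g_n - compensator)

# One observation at 0.5: the path drifts down before the jump and sits higher after it.
before = martingale_part([0.5], 0.25)
after = martingale_part([0.5], 0.75)
print(before, after)
```

Before the observation the value is \log(0.75), and after it the value is 1+\log(0.5), matching the closed form above.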

For a long time the transformation, although known, was not used. Later, the work of researchers such as R. Koenker, W. Stute, J. Bai, H. Koul, A. Koening and others made it popular in econometrics and other fields of statistics.

References

Khmaladze, E.V. (1981) "Martingale approach in the theory of goodness-of-fit tests." Theor. Prob. Appl., 26, 240–257.